UniBic: Sequential row-based biclustering algorithm for analysis of gene expression data.
Abstract
Biclustering algorithms aim to provide an effective and efficient way to analyze gene expression data by finding groups of genes with trend-preserving expression patterns under certain conditions. They have been widely developed since Morgan et al. pioneered the partitioning of a data matrix into submatrices with approximately constant values. However, identifying general trend-preserving biclusters, which are the most meaningful substructures hidden in gene expression data, remains a highly challenging problem. We found an elementary method by which biologically meaningful trend-preserving biclusters can be readily identified from large, noisy and complex data. The basic idea is to apply the longest common subsequence (LCS) framework to selected pairs of rows in an index matrix derived from the input data matrix, in order to locate a seed for each bicluster to be identified. We tested the method on synthetic and real datasets and compared its performance with currently competitive biclustering tools. The new algorithm, named UniBic, outperformed all previous biclustering algorithms in commonly used evaluation scenarios, except for BicSPAM on narrow biclusters: BicSPAM was somewhat better at finding narrow biclusters, the task for which it was specifically designed.
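The LCS idea in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation. It assumes that each row of the index matrix lists the column indices of a gene in order of increasing expression value, which the abstract does not spell out, and the function names `index_row`, `lcs` and `seed_columns` are hypothetical. Under that assumption, a common subsequence of two index rows is a set of conditions over which both genes rise in the same order.

```python
from typing import List, Sequence


def index_row(values: Sequence[float]) -> List[int]:
    """Column indices ordered by increasing expression value.

    Assumption: this ordering stands in for one row of the "index matrix".
    Python's sort is stable, so tied values keep their column order.
    """
    return sorted(range(len(values)), key=lambda j: values[j])


def lcs(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Return one longest common subsequence of a and b (classic O(n*m) DP)."""
    n, m = len(a), len(b)
    # dp[i][j] = length of the LCS of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover one optimal subsequence.
    out: List[int] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def seed_columns(row_a: Sequence[float], row_b: Sequence[float]) -> List[int]:
    """Columns, in trend order, on which two genes share the same ordering."""
    return lcs(index_row(row_a), index_row(row_b))


gene_a = [1.0, 5.0, 3.0, 9.0, 2.0]
gene_b = [10.0, 50.0, 30.0, 20.0, 15.0]
print(seed_columns(gene_a, gene_b))  # [0, 4, 2, 1]
```

In this toy example both genes increase along columns 0, 4, 2, 1, even though their absolute values differ, so that pair of rows and those four columns would form a candidate seed. How UniBic selects row pairs and grows a seed into a full bicluster is not described in the abstract and is not modelled here.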
Related resources
Application of Cardinality based GRASP to the Biclustering of Gene Expression Data
Biclustering algorithms perform simultaneous row and column clustering of a given data matrix. In a gene expression dataset, a bicluster is a subset of genes that exhibit similar expression patterns across a subset of conditions. Biclustering is a useful data mining technique for identifying local patterns in gene expression data. In this paper, biclusters are identified in two steps. In the fir...
Biclustering Gene Expressions Using Factor Graphs and the Max-Sum Algorithm
Biclustering is an intrinsically challenging and highly complex problem, studied particularly in biology, where the goal is to simultaneously cluster the genes and samples of an expression data matrix. In this paper we present a novel approach to gene expression biclustering by providing a binary factor graph formulation of the problem. In more detail, we reformulate biclustering as a se...
Recent patents on biclustering algorithms for gene expression data analysis.
In DNA microarray experiments, discovering groups of genes that share similar transcriptional characteristics is instrumental in functional annotation, tissue classification and motif identification. However, in many situations a subset of genes only exhibits a consistent pattern over a subset of conditions. Although used extensively in gene expression data analysis, conventional clustering alg...
Randomized Algorithmic Approach for Biclustering of Gene Expression Data
Microarray data processing revolves around the pivotal issue of locating genes that alter their expression in response to pathogens, other organisms or other environmental conditions, as revealed by a comparison between infected and uninfected cells or tissues. To have a comprehensive analysis of the corollaries of certain treatments, diseases and developmental stages embodied as a data ...
A New Strategy of Geometrical Biclustering for Microarray Data Analysis
In this paper, we present a new biclustering algorithm that provides a geometrical interpretation of similar microarray gene expression profiles. Unlike standard clustering analyses, biclustering can perform simultaneous classification on the row and column dimensions of a data matrix. The main objective of the strategy is to reveal the submatrix, in which a subset of genes exhi...
Journal: Scientific Reports
Volume: 6
Publication date: 2016